Dungeons and Dragons is a fantasy tabletop role-playing game. Players make choices to depict characters in a story, and the Game Master describes the world and the consequences of the players’ actions. Part of describing that world is bringing to life the horrible monsters that inhabit it. Each monster is represented by about 30 statistics that make it unique and meaningfully different to fight, so that combat feels responsive to its strengths and weaknesses. This report uses data scraped from this website, which has compiled by hand all of the officially released “statblocks” from the books published by Wizards of the Coast, the creators of Dungeons and Dragons.
I chose to create a database of the D&D monsters because I love this game and I wanted to highlight its mechanical approach to storytelling. It’s interesting how the designers choose to bring to life monsters as different as a Gold Dragon and an Awakened Tree using the same statistics.
The data scraped from the 900+ web pages was quite messy, and cleaning it was the hardest part of the project. Not all of the problems are notable enough to mention here, but these were the most significant challenges.
Most notably, one variable, “desc”, contained information of the form “var1 value1 var2 value2 var3 value3…”. The trouble with extracting this data was that not every variable of interest appeared in every entry. My solution was to loop through the key variables I wanted to extract, each time separating the desc variable into two columns: the first labeled after the key variable, and the second labeled “desc” and containing all the remaining information. At the end of this loop, however, the values are not in the correct columns, but are instead shifted some number of columns to the right. Luckily, standing between the values and their correct positions are NA values, which can be “surfed”: checking each column from right to left, if the column to its left holds an NA, the value shifts one column to the left. This process shifts the values to their correct columns. A similar method was used to separate the different types of speeds, such as flying, swimming, and climbing.
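The pipeline behind this report was written in R and used the loop-and-shift approach described above. As a minimal sketch of a different, more direct way to cope with keys that are missing from some entries, the Python below finds every known key in the string and treats the text up to the next key as its value. The key names and the sample string are hypothetical, since the real contents of desc are not shown in this report.

```python
import re

# Hypothetical key names; the real keys inside "desc" are not listed in this report.
KEYS = ["Armor Class", "Hit Points", "Speed"]

def split_desc(desc, keys=KEYS):
    """Map each key to the text that follows it, or None if the key is absent."""
    pattern = "|".join(re.escape(key) for key in keys)
    matches = list(re.finditer(pattern, desc))
    values = {key: None for key in keys}
    for i, match in enumerate(matches):
        # A value runs from the end of its key to the start of the next key found.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(desc)
        values[match.group(0)] = desc[match.end():end].strip()
    return values

# "Hit Points" is missing from this made-up entry, so it stays None.
row = split_desc("Armor Class 15 Speed 30 ft.")
```

Because each value is stored under its own key, nothing needs shifting afterwards. The sketch assumes that a key name never appears inside another key's value, which real statblocks may violate.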
Another notable issue was the parsing of the Challenge Rating, or CR of the monsters. Parsing the fractions was returning inaccurate values, so I manually replaced the three possible fractions, 1/2, 1/4, and 1/8 with their decimal equivalents, and then parsed every other number. Many other variables were separated with stringr and then parsed as numbers.
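As an alternative to replacing the three fractions by hand, a fraction-aware parser can handle both forms in one step. This is a small Python sketch rather than the R code actually used; the function name `parse_cr` is made up for illustration.

```python
from fractions import Fraction

def parse_cr(text):
    """Convert a CR string such as "1/4" or "17" to a float."""
    # Fraction accepts both "numerator/denominator" and plain integer strings.
    return float(Fraction(text.strip()))

crs = [parse_cr(s) for s in ["1/8", "1/4", "1/2", "17"]]
```

Since 1/2, 1/4, and 1/8 are exact in binary floating point, the converted values match the decimal equivalents 0.5, 0.25, and 0.125 exactly.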
The last significant challenge was creating columns to select monsters by their damage resistances, vulnerabilities, and immunities, as well as by the languages they speak, the senses they use, and the conditions they are immune to.
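One way to build such selection columns is to turn a free-text list into one logical flag per category. The Python sketch below assumes the immunities arrive as a comma- or semicolon-separated string and reuses the `_imm` suffix from the `psychic_imm` column; the list of damage types is an illustrative subset, and the function name is hypothetical.

```python
import re

# Illustrative subset of damage types; the full dataset covers more.
DAMAGE_TYPES = ["fire", "poison", "psychic"]

def immunity_flags(immunities, damage_types=DAMAGE_TYPES):
    """Return one True/False flag per damage type, e.g. {"psychic_imm": True}."""
    # Split on commas or semicolons and normalize case and spacing.
    listed = {part.strip().lower() for part in re.split(r"[,;]", immunities or "")}
    return {f"{damage}_imm": damage in listed for damage in damage_types}

flags = immunity_flags("Poison, Psychic")
```

Exact token matching is a simplification: conditional entries such as immunities that apply only to nonmagical attacks would need extra handling.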
Besides these large challenges, many monsters have small caveats to the typical rules, which act somewhat like a baked potato in a car’s gas tank, breaking my original loop, as well as some of the patterns that I used to extract important data. Most of these were easy to locate, however, and the next largest issues were solvable with a line or two of code.
Of the over one hundred variables in the cleaned dataset, the rest of the report will use only 13. These variables are:
| Variable | Type | Description |
|---|---|---|
| name | String | The name of the monster |
| size | String | The categorical size of the monster |
| type | String | The type of creature the monster is |
| CR | Number | The monster’s Challenge Rating, or difficulty in battle |
| STR | Integer | The monster’s strength score |
| DEX | Integer | The monster’s dexterity score |
| CON | Integer | The monster’s constitution score |
| INT | Integer | The monster’s intelligence score |
| WIS | Integer | The monster’s wisdom score |
| CHA | Integer | The monster’s charisma score |
| image | String | The link to an image of the monster |
| psychic_imm | Logical | Whether or not the monster is immune to psychic damage |
| Elvish | Logical | Whether or not the monster speaks Elvish |
The outcome variables we will be investigating will be the six core statistics of each monster, which range from 0 to 30. These stats will be represented by their in-game abbreviations: STR for Strength, DEX for Dexterity, CON for Constitution, INT for Intelligence, WIS for Wisdom, and CHA for Charisma. We will investigate their connections to a monster’s size, language proficiency, and its creature type. And at the end, we’ll see how the stats are correlated with each other! Let’s delve.
We begin with a look at the univariate distributions of each of the core stats. Get used to this six-graph format for the variables, as we’ll see it come up again later. The statistics for these distributions are in the table below, but a key feature of these distributions is the soft minimum for some of the stats at a value of 10, which is considered the average. Apparently, most monsters that adventurers encounter are considered to be of above average Constitution, Dexterity, and Wisdom. (Wisdom, it should be noted, refers more to cleverness or awareness than book-smarts.) However, this minimum does not hold for Intelligence, Charisma, and to some extent, Strength. This implies that a good portion of monsters are of below average Intelligence, representing book-smarts, and Charisma, representing force of will. By contrast, this paints the players as knowledgeable and purposeful, defeating more powerful foes by clever strategy and sheer force of will.
| Statistic | Mean | Standard Deviation |
|---|---|---|
| STR | 16.02 | 6.49 |
| DEX | 13.33 | 3.45 |
| CON | 16.15 | 4.64 |
| INT | 10.27 | 6.17 |
| WIS | 12.78 | 3.83 |
| CHA | 11.57 | 6.03 |
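A summary like the one above takes only a few lines to compute. The Python sketch below uses the STR and DEX scores of the three monsters listed in the sample table near the end of this report as toy input, so its numbers will not match the full-dataset values. It computes the sample standard deviation (the n-1 form); whether the table used that form is an assumption.

```python
from statistics import mean, stdev

# Toy input: scores of the three monsters in this report's sample table.
scores = {
    "STR": [16, 18, 5],
    "DEX": [10, 13, 11],
}

def summarize(columns):
    """Return (mean, sample standard deviation) for each column."""
    return {name: (mean(values), stdev(values)) for name, values in columns.items()}

summary = summarize(scores)
```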
Now we’ll see a few of the many interesting connections in this data set.
The first is the relationship between a monster’s size and its Strength score.
Clearly, the categorical size of a monster is one of the biggest predictors of its Strength score. Gargantuan monsters have a median Strength of 27, while Tiny monsters have a measly median of 3. With scores ranging from 0 to 30, that’s about as drastic a difference as we can get.
Conversely, drawing the same graph for Dexterity scores, we actually see median Dexterity decrease as the monsters increase in categorical size. This makes sense in the context of the “big, slow bad guy” trope, where the heroes quickly flit around the groggy enemy, dodging its lethal attacks. In this case, Gargantuan monsters have a median Dexterity of 12, while Tiny monsters have a slightly higher median of 15. Though that’s a small difference between the bookends, the consistent decrease in median as we move up the size categories points to the same negative relationship.
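The per-size medians quoted above come from grouping monsters by size and taking the median within each group. Here is a minimal Python sketch of that calculation, using the size and STR values of the three monsters in the sample table near the end of this report as toy input; the function name is made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Toy input: (size, STR) for the three monsters in this report's sample table.
monsters = [("Large", 16), ("Large", 18), ("Small", 5)]

def median_by_group(pairs):
    """Group scores by category and return the median of each group."""
    groups = defaultdict(list)
    for group, score in pairs:
        groups[group].append(score)
    return {group: median(values) for group, values in groups.items()}

medians = median_by_group(monsters)
```

Swapping the STR values for DEX values gives the second comparison with no other changes.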
The graph above illustrates the difference in Intelligence scores between monsters who speak or understand Elvish, and those who don’t. This difference holds true for almost any language, but it’s most pronounced in Elvish. The median intelligence for Elvish speakers is 14, while the median for those who can’t understand or speak Elvish is 10.
As a fun fact, at my D&D table, Elvish is Greek when written, and French when spoken. The idea behind this decision is that human minds can’t comprehend how the language is read, so the words and spoken tongue should seem wildly disconnected to the players.
Sorting the monsters by whether or not they have an immunity to psychic damage, and then comparing their Intelligence scores, we see a bimodal distribution emerge in the psychic-immune group. My best hypothesis as to the cause of this effect is that there are two types of enemies who may be immune to psychic damage: intellectual fortresses, who have developed their minds to be immune to mental manipulation, and the empty minds, those with nothing up there to damage. Imagine the difference between trying to read the mind of Gandalf vs Patrick Star.
One of the strongest predictors of a monster’s Charisma score is its creature type, a few of which are visualized above. There are 15 creature types, but only five are presented above. Celestials, the D&D equivalent of angels, have a median Charisma score of 20, the maximum score for a player character, and Plants have a median Charisma score of 5.
The strongest predictor of Wisdom score is a monster’s Challenge Rating, or CR, which represents how difficult it is to defeat in combat. Since this number is an indicator of the monster’s fighting capabilities, we can consider this graph an indication of how the Wisdom score scales as the monsters become more powerful. We can create similar graphs for every core statistic.
The darkness of a point in these graphs indicates the number of entries at that point, with the darkest points representing 10 or more entries. We can see that most of the statistics follow a trend similar to the Wisdom score, scaling up with Challenge Rating. Dexterity, however, appears to be the exception, barely scaling at all as CR increases. Intelligence also shows only a weak positive correlation. Apparently, Intelligence and especially Dexterity are not key factors in determining a monster’s difficulty in battle.
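The shading rule can be written out as arithmetic: count the entries at each point and cap the count at 10. The Python sketch below shows only that mapping from counts to opacity. It is not the plotting code used for the graphs, and the (CR, WIS) pairs are made up.

```python
from collections import Counter

def point_alphas(points, cap=10):
    """Opacity per distinct point: 1.0 once `cap` or more entries overlap."""
    counts = Counter(points)
    return {point: min(n, cap) / cap for point, n in counts.items()}

# Made-up (CR, WIS) pairs: one point with 12 entries, one with 3.
points = [(1, 10)] * 12 + [(5, 14)] * 3
alphas = point_alphas(points)
```

Here the 12-entry point reaches full darkness, while the 3-entry point is drawn at three tenths of it.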
To conclude this report, we’ll investigate the correlations between the core stats. Above is the plot of Strength vs Dexterity, with a correlation coefficient of -0.1, indicating a weak negative correlation if anything at all. It appears that monsters may be any combination of dexterous and mighty, but in general, strength in one attribute may predict a weakness in the other.
Conversely, observing the relationship between Constitution and Strength yields a correlation coefficient of 0.84, indicating a strong positive correlation. A monster that is mighty will likely be tough to bring down. Rather than examine each of these graphs individually, we can create a grid of them, keeping the axes constant along vertical and horizontal lines.
Though this graph may seem overwhelming at first, it may help to recognize the last two graphs we looked at in the first and second positions in this grid. Since they share the Strength score as their y-axis, they are in the first row, where every scatterplot’s y-axis is Strength. Likewise, the y-axes are held constant in the other rows, and the x-axes are held constant in the columns. A helpful intuition for this grid is that scatterplots that picture a dark line imply a strong correlation, such as the graph at the intersection of Intelligence and Charisma, while a pale scattering of points indicates weak or no correlation, such as the intersection of Intelligence and Strength. Feel free to study the grid for a second or two and speculate as to why certain attributes may be more or less correlated.
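The correlation coefficients quoted above can be computed for every pair of stats at once, which gives a numeric counterpart to the grid. The Python sketch below implements the Pearson coefficient directly. The three-row input reuses the monsters from the sample table near the end of this report, so the resulting values are far too noisy to compare with the full-dataset figures.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either column is constant.
    return cov / sqrt(var_x * var_y)

def pairwise_correlations(columns):
    """Correlation for every unordered pair of columns, keyed by name pair."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}

# Toy input: three stats of the three monsters in this report's sample table.
columns = {"STR": [16, 18, 5], "CON": [16, 20, 10], "INT": [14, 7, 11]}
correlations = pairwise_correlations(columns)
```

Each key in the result corresponds to one off-diagonal cell of the grid.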
The most profound takeaways from these multivariate and univariate analyses are that the mental stats (INT, WIS, CHA) are largely correlated, and most core stats can be predicted by some other attribute of the monster, such as size, creature type, or damage immunities. These are the results I was expecting, and it’s cool to see them visualized like this.
Thanks for reading about my database and my hobby! I hope you found the mechanical storytelling of the game interesting, but most importantly, I hope you enjoyed reading.
These are not necessarily parts of the formal report, but if I have this database, I may as well have fun with it!
Since I imported the image links of these monsters, let’s take a look at a few of them! Here is a table of three random monsters and their core stats, followed by their pictures in the same order as the table.
| | name | size | type | STR | DEX | CON | INT | WIS | CHA | CR |
|---|---|---|---|---|---|---|---|---|---|---|
| 301 | Eyedrake | Large | Aberration | 16 | 10 | 16 | 14 | 14 | 16 | 8.00 |
| 823 | Venom Troll | Large | Giant | 18 | 13 | 20 | 7 | 9 | 7 | 7.00 |
| 748 | Steam Mephit | Small | Elemental | 5 | 11 | 10 | 11 | 10 | 12 | 0.25 |
Over the course of this project, I made many graphs that I deemed extraneous, confusing, or downright useless in the final edit, but I’d still like to share some of the prettiest ones in an art for art’s sake kind of way. Some of these graphs are unreadable, and provide little insight into the database, but I still find them pretty.